SOFTWARE FOR ANALYZING DATA CONTAINED IN OUTPUT FILES 
CREATED BY THE SPATL AND MLTCRP ROUTINES OF THE 
ASSESSMENT SOFTWARE SYSTEM 

INTRODUCTION 

The output files from the Accuracy Assessment routines SPATL and MLTCRP con- 
tain information about individual Procedure 1 processings of Large Area Crop 
Inventory Experiment (LACIE) blind sites. To analyze these data and aggregate 
the results over many blind sites, a program was developed to sort the data; 
it then served as the basis for other programs that investigate analyst 
dot-labeling accuracy, clustering purity, and classification accuracy. This 
memorandum describes the operation of this software. 

BASIC PROGRAM FOR SORTING OUTPUT FILES - ANALYZE 

Program ANALYZE uses the header information in the output files from MLTCRP or 
SPATL to sort the data on the basis of state, segment number, processing date, 
number of acquisitions, and dot type. (A description of the contents of these 
output files is contained in refs. 1 and 2.) ANALYZE requires two data files 
as inputs: DATSEL.DAT, which contains the selection criteria for the run, 
and INPUT.DAT, which contains the name of the output files to be used and the 
range of output file version numbers to be accessed. The input file INPUT.DAT 
can contain any additional information necessary for the processing of the 
data. The file must contain a minimum of two lines. The first line is the 
name of the output files to be accessed in the form DB0:[110,6]MCRP. The 
device and user identification code (UIC) are optional if they are the same as 
the device and UIC in which the task resides. The second line contains the 
starting and ending version numbers in octal, using FORTRAN format 
(O4,1X,O4). Any additional information may follow this second line. 
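As an illustration of this two-line layout, the following Python sketch reads the file name and the octal version range. It is not part of the original FORTRAN software: the function name `parse_input_header` is hypothetical, and the sketch assumes the version numbers occupy columns 1-4 and 6-9 of the second line.

```python
def parse_input_header(lines):
    """Return (file_name, first_version, last_version, extra_lines)."""
    if len(lines) < 2:
        raise ValueError("INPUT.DAT must contain at least two lines")
    file_name = lines[0].strip()
    # Pad so that short lines can still be sliced by column.
    version_line = lines[1].rstrip("\n").ljust(9)
    # Columns 1-4 and 6-9 are read as octal integers; column 5 is skipped.
    first_version = int(version_line[0:4], 8)
    last_version = int(version_line[5:9], 8)
    # Anything after the second line is analysis-specific data.
    return file_name, first_version, last_version, lines[2:]


name, first, last, extra = parse_input_header(
    ["[110,6]MCRP\n", "0001 0012\n", "analysis-specific data\n"]
)
print(name, first, last)  # the octal range 0001-0012 is 1 through 10 in decimal
```

Returning the remaining lines unchanged leaves their interpretation to each analysis program, since their content differs from one program to the next.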

Input file DATSEL.DAT contains two lines for each selection criterion. The 
first line is the general selection criterion, and the second line is the spe- 
cific selection basis. A sample data set using each of the criteria follows. 

SEGMENT 
5 1005, 1007, 19.??, 1215, 1515 
STATE 
4 CO,ND,SD,MT 
NUMBER OF ACQ. 
3 
DATE 
7200,7300 
DOTS 
3 1,2,3 
(blank line) 

The blank line indicates the end of a data set. More than one set of selec- 
tion criteria can be included in the data set, each separated by a blank line. 
A line containing "END" must follow the blank line after the last set of 
selection criteria. If no selection is to be made for a particular criterion, 
it is not included in the data set. The limits on the selection criteria are: 
(1) the number of segments cannot exceed 10; (2) the number of states cannot 
exceed 5; and (3) the number of dot types cannot exceed 4. 
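The structure of DATSEL.DAT can be sketched in Python as follows. This is an illustrative reader, not the DATSEL subroutine itself. It assumes that the SEGMENT, STATE, and DOTS lines begin with a count followed by that many values, and that the NUMBER OF ACQ. and DATE lines hold plain integers. The name `parse_datsel` and the sample values are hypothetical.

```python
# Maximum number of entries allowed for each list-type criterion.
LIMITS = {"SEGMENT": 10, "STATE": 5, "DOTS": 4}


def parse_datsel(text):
    """Split DATSEL.DAT text into a list of selection-criteria sets."""
    sets, current = [], {}
    lines = iter(text.splitlines())
    for line in lines:
        keyword = line.strip()
        if keyword == "END":
            break
        if not keyword:
            # A blank line closes the current set of criteria.
            if current:
                sets.append(current)
            current = {}
            continue
        # The line after the keyword carries the specific selection basis.
        values = next(lines).replace(",", " ").split()
        if keyword in LIMITS:
            count, items = int(values[0]), values[1:]
            if count != len(items) or count > LIMITS[keyword]:
                raise ValueError(f"bad {keyword} line")
            current[keyword] = items
        else:
            current[keyword] = [int(v) for v in values]
    return sets


sample = (
    "SEGMENT\n2 1005,1007\nSTATE\n2 ND,SD\nDATE\n7200,7300\n"
    "\nDOTS\n1 2\n\nEND\n"
)
print(parse_datsel(sample))
```

A criterion that is absent from a set does not appear in its dictionary, which mirrors the rule that unused criteria are omitted from the data set.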

Program ANALYZE produces a line printer listing indicating the file name 
accessed, the range of version numbers used, the basis for selection, and the 
number of files selected by the program. 

Appendix A is a compiled listing of program ANALYZE. Three subroutines are 
required: DATSEL, which is used to input the selection criteria; SELECT, 
which determines if the individual files meet the selection criteria; and 
SELIST, which produces the line printer listing of the information concerning 
the selection process. 

There are six blocks in the main program where code can be written to perform 
individual analyses using the output files selected by the program. The 
inserts, labeled 0 through 5, are used as follows. 

0 - Comments concerning the analysis to be performed 

1 - Array specifications and DATA statements necessary for the analysis 

2 - Input of additional data about the processing from input file INPUT.DAT 

3 - Initialization of aggregation arrays before the files are accessed 

4 - Computations based on a file which has met the selection criteria 

5 - Outputting the final results of the computations (This section is 
located after all files have been checked.) 
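The role of inserts 3, 4, and 5 can be pictured with a small Python driver. It is only a sketch of the control flow described above, not a translation of the FORTRAN listing. The function `analyze`, its callback parameters, and the record layout are all hypothetical, and `select` merely stands in for the test made by subroutine SELECT.

```python
def analyze(files, select, initialize, accumulate, report):
    """Run one analysis over the files that pass selection.

    files       iterable of already-read output-file records
    select      predicate standing in for subroutine SELECT
    initialize  insert 3: build the aggregation state
    accumulate  insert 4: update the state from one selected file
    report      insert 5: produce results after all files are checked
    """
    state = initialize()
    n_selected = 0
    for record in files:
        if select(record):
            accumulate(state, record)
            n_selected += 1
    return report(state), n_selected


# Usage: collect the segment numbers of all files from one state.
files = [
    {"state": "ND", "segment": 1005},
    {"state": "MT", "segment": 1215},
    {"state": "ND", "segment": 1007},
]
result, n_selected = analyze(
    files,
    select=lambda rec: rec["state"] == "ND",
    initialize=list,
    accumulate=lambda state, rec: state.append(rec["segment"]),
    report=sorted,
)
print(result, n_selected)
```

Passing the analysis steps as callables plays the same role as the numbered inserts: the selection loop stays fixed while each derived program supplies its own computations.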

The programs described in the following paragraphs are all based on ANALYZE. 

PROGRAM TO DETERMINE ANALYST DOT LABELING ACCURACY - DOTANL 

The program DOTANL uses the data contained in the fourth record of the SPATL 
output files to determine the analyst dot-labeling accuracy. The program 
creates a two-dimensional array; one dimension corresponds to the ground truth 
crop code and the other dimension corresponds to the analyst label. This 
array is loaded with a count of the mutual occurrences of a ground truth crop 
code and an analyst label for all of the analyst-labeled dots in a file. 

DOTANL produces a line printer output with the number of dots which were 
labeled in each of the analyst categories for each ground truth crop code. 
The total number of dots in each crop code and the total number of dots with 
each label are shown. The program can also produce a percentage of correct 
classification for each crop code. 
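The tabulation described above amounts to a contingency table with row and column totals. The Python sketch below shows one way to compute it. The function names, crop codes, and label values are made up for illustration and are not taken from the SPATL files.

```python
from collections import Counter


def tabulate_dots(dots):
    """Count dots by (ground truth crop code, analyst label).

    dots is a list of (crop_code, analyst_label) pairs.
    """
    table = Counter(dots)
    crop_totals = Counter(code for code, _ in dots)
    label_totals = Counter(label for _, label in dots)
    return table, crop_totals, label_totals


def percent_correct(table, crop_totals, correct_label):
    """Percent of dots in each crop code that carry the proper label."""
    result = {}
    for code, total in crop_totals.items():
        # Crop codes without a known proper label are left out.
        if code in correct_label:
            hits = table[(code, correct_label[code])]
            result[code] = 100.0 * hits / total
    return result


dots = [(101, "W"), (101, "W"), (101, "N"), (205, "N"), (300, "N")]
table, crop_totals, label_totals = tabulate_dots(dots)
print(percent_correct(table, crop_totals, {101: "W", 205: "N"}))
```

Crop code 300 has no entry in the mapping, so no percentage is reported for it.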

Input file DATSEL.DAT is set up for the particular criteria required. Input 
file INPUT.DAT has the name of the SPATL files on the first line, and the 
version numbers to be accessed on the second line. Following the second line 
is a set of lines indicating the proper analyst label for each crop code. 
This information is loaded in the form of beginning crop code, ending crop 
code, and the correct analyst label, using FORTRAN format (3I5). If a 
particular crop code is not included, the percent correct column is left blank 
for that code. When all of the crop codes have been entered, a blank line is 
entered to indicate the end of data. If the percent correct option is not 
desired, a blank line should follow the line containing the range of version 
numbers. 
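A reader for these label lines might look like the following Python sketch. It assumes each line holds three integers in consecutive five-column fields (beginning crop code, ending crop code, label) and that the range is inclusive. The function name `read_label_ranges` and the sample codes are hypothetical.

```python
def read_label_ranges(lines):
    """Map each crop code to its proper analyst label.

    Reading stops at the first blank line, which marks the end of data.
    """
    correct_label = {}
    for line in lines:
        if not line.strip():
            break
        # Pad so that the three five-column fields can always be sliced.
        padded = line.rstrip("\n").ljust(15)
        first, last, label = (int(padded[i:i + 5]) for i in (0, 5, 10))
        for code in range(first, last + 1):
            correct_label[code] = label
    return correct_label


lines = ["  100  104    1", "  200  200    2", "", "  300  300    3"]
print(read_label_ranges(lines))
```

If the first line after the version numbers is blank, the mapping comes back empty, which corresponds to running without the percent correct option.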

Appendix B is a sample output line printer listing obtained from DOTANL. 

PROGRAM TO ANALYZE CLUSTER PURITY - CLUANL 


The program CLUANL uses the data contained in the fourth and sixth records of 
the MLTCRP output files to analyze the cluster purity for individual segments 
and to determine overall cluster purity. The program calculates two measures 
of cluster purity based on a two-category ground truth division of the pixels. 
The first measure is the average proportion, which is calculated by the 
following formula: 

$$\text{Average proportion} = \frac{1}{N} \sum_{i} N_i P_i$$

where 

$N$ is the total number of subpixels in clusters other than the DO/DU cluster 

$N_i$ is the number of subpixels in the $i$th cluster 

$P_i$ is the proportion of the majority constituent in the $i$th cluster 

The second measure of cluster purity is the average variance, which is 
calculated by: 

$$\text{Average variance} = \frac{1}{N} \sum_{i} N_i P_i (1 - P_i)$$

where $N$, $N_i$, and $P_i$ are as defined for the average proportion. 
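The two purity measures can be illustrated with a short Python sketch. It weights each cluster by its subpixel count, as the definitions above indicate, and takes each cluster's variance to be the binomial form P(1 - P), which is an assumption of this sketch. The function name and the cluster values are invented for the example.

```python
def cluster_purity(clusters):
    """Return (average proportion, average variance).

    clusters is a list of (subpixel count, majority proportion) pairs.
    The caller is expected to leave out the DO/DU cluster.
    """
    n_total = sum(n for n, _ in clusters)
    avg_proportion = sum(n * p for n, p in clusters) / n_total
    # Assumed per-cluster variance: binomial P * (1 - P).
    avg_variance = sum(n * p * (1.0 - p) for n, p in clusters) / n_total
    return avg_proportion, avg_variance


avg_p, avg_v = cluster_purity([(100, 0.9), (300, 0.7)])
print(avg_p, avg_v)
```

With these two clusters the larger, less pure cluster dominates both measures because it holds three quarters of the subpixels.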

CLUANL also calculates histograms of the clusters based on small-grains 
proportions and of cluster small-grains proportions weighted by pixels. The 
program analyzes cluster labeling accuracy based on three labels for the 
type 1 dot closest to the mean of the cluster: the analyst label, the 
classifier label, and the ground truth label. For each label, the program 
determines the number labeled as small grains and the number labeled as 
nonsmall grains, both for clusters with a majority of small grains and for 
clusters with a majority of nonsmall grains. Appendix C shows a typical 
listing for CLUANL. The program also has the capability of printing out the 
following information about the individual clusters: 

a. Cluster number 

b. Number of subpixels in cluster 

c. Number of ground truth crop codes in cluster 

d. Analyst label and location of dot used to label cluster 

e. Classifier label for dot used to label cluster 

f. Ground truth label for dot used to label cluster 

g. Ground truth label and proportion for largest crop code in cluster 

h. Same information for second largest crop code 

i. Same information for third largest crop code 

j. Same information for fourth largest crop code 

k. Proportion of cluster not in the four largest crop codes 

l. Proportion of either small grains or nonsmall grains, whichever is larger 

m. Majority class (small grains or nonsmall grains) for cluster. 

This printout is currently suppressed, but can be obtained by the removal of 
two comment characters (C) in print statements. 

In order to use Program CLUANL, the DATSEL.DAT input file is set up for the 
particular criteria required, with type 1 dots. Input file INPUT.DAT has the 
name of the MLTCRP files on the first line, and the version numbers to be 
accessed on the second line. Following the second line of the data set is 
the information needed to sort the pixels into small grains or nonsmall grains. 
The information is loaded in the form of beginning crop code, ending crop 
code, and small-grains category. The small-grains category is a four-digit 
number, of which the first digit is the small-grains class, and the remaining 
three digits are the percentage of small grains in the crop code. This 
explicit percentage is used for strip fallow crop codes. 
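Decoding the four-digit small-grains category is a matter of separating the leading digit from the last three. The Python sketch below shows the arithmetic. The function name is hypothetical, and the example values are chosen only to show the split, since the meaning of each class digit is not given here.

```python
def decode_category(value):
    """Split a four-digit category into (class digit, small-grains fraction)."""
    small_grains_class, percent = divmod(value, 1000)
    if not (0 <= small_grains_class <= 9 and 0 <= percent <= 100):
        raise ValueError(f"not a valid four-digit category: {value}")
    return small_grains_class, percent / 100.0


print(decode_category(1100))  # class digit 1, 100 percent small grains
print(decode_category(1050))  # class digit 1, 50 percent small grains
```

A partial percentage such as the second example is the kind of value that a strip fallow crop code would carry.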

Program CLUANL can be used to investigate cluster purity for any crop by 
changing the input data set. 

PROGRAM FOR ANALYZING CLASSIFICATION ACCURACY - CLSANL 


Program CLSANL uses the data from records 4, 5, and 6 of the MLTCRP output 
files to determine the small-grains proportions at different stages in the 
Procedure 1 processing. The program makes three passes through the output 
files for type 1, 2, and 3 dots. Therefore, DATSEL.DAT must be the following: 

DOTS 
1 1 

DOTS 
1 2 

DOTS 
1 3 

END 

Input file INPUT.DAT is the same as for CLUANL. 

Program CLSANL calculates the following proportions: 

a. Ground truth proportion - Determined from the data in record 5 using the 
transformation in input data set INPUT.DAT. 

b. Uncorrected machine proportion - Calculated from record 5. No threshold 
pixels are considered in determining the proportion. 

c. Bias corrected machine proportion - The uncorrected machine proportion 
is bias corrected using the analyst labels for the type 2 or type 3 dots. 
(The type 3 dots are type 2 dots which were changed by the analyst after 
the classification results were available.) If type 3 dots are not 
present, type 2 dots are used for the bias correction. 

d. Type 2 dots proportion using classifier labels - Uses the labeled type 2 
dots as a random sample of the segment and calculates a proportion based 
on the classifier label for each dot. 

e. Type 2 dots proportion using ground truth labels - Uses the labeled type 
2 dots as a random sample of the segment and calculates a proportion 
based on the ground truth label for each dot. 

f. Type 2 dots proportion using analyst labels - Uses the labeled type 2 
dots as a random sample of the segment and calculates a proportion based 
on the analyst label for each dot. 

g. Cluster proportion using analyst labels - The pixels in each cluster are 
sorted on the basis of the analyst label for the type 1 dot used to label 
the cluster, and a proportion determined on this basis. 

h. Cluster proportion using ground truth labels - The pixels in each cluster 
are sorted on the basis of the ground truth label for the type 1 dot used 
to label the cluster, and a proportion determined on this basis. 

i. Machine proportion bias corrected using the ground truth labels for the 
type 2 data - The bias correction is made by comparing the classifier 
labels with the ground truth labels for the type 2 dots. 
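The three type 2 dot proportions (items d, e, and f) share one calculation: the dots are treated as a random sample and the fraction carrying a small-grains label is taken. The Python sketch below illustrates this for the three label sources. The function name, the dictionary keys, and the label values are hypothetical, and the bias correction itself is not shown because its formula is not given here.

```python
def dot_proportions(dots, small_grains_labels):
    """Estimate the small-grains proportion from labeled type 2 dots.

    dots is a list of dicts holding 'classifier', 'ground_truth', and
    'analyst' labels; one proportion is returned per label source.
    """
    total = len(dots)
    return {
        source: sum(d[source] in small_grains_labels for d in dots) / total
        for source in ("classifier", "ground_truth", "analyst")
    }


dots = [
    {"classifier": "S", "ground_truth": "S", "analyst": "S"},
    {"classifier": "S", "ground_truth": "N", "analyst": "N"},
    {"classifier": "N", "ground_truth": "N", "analyst": "S"},
    {"classifier": "N", "ground_truth": "N", "analyst": "N"},
]
print(dot_proportions(dots, {"S"}))
```

Comparing the three values for a segment shows how far the classifier and analyst estimates depart from the ground truth estimate on the same sample of dots.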

Appendix D is the line printer listing obtained from CLSANL. Data contained 
in this listing is also written to a disk file called CLSANL.DAT, which is 
used for automatic plotting of the classification accuracy calculated by the 
program. Both the line printer listing and the output file have the informa- 
tion ordered by state and segment number. 

REFERENCES 

1. Carnes, J. G.: Modification to the Accuracy Assessment Analysis Routine 
SPATL to Produce an Output File. LEC-12175, JSC-14297, June 1978. 

2. Carnes, J. G.: Modification to the Accuracy Assessment Analysis Routine 
MLTCRP to Produce an Output File. LEC-12176, JSC-14298, June 1978. 

APPENDIX A 

COMPILED LISTING FOR PROGRAM ANALYZE 